Optimal two-stage genome-wide association designs based on false discovery rate

Authors

  • Hansong Wang
  • Daniel O. Stram
Abstract

Genome-wide association studies are likely to be conducted on a large scale in the near future. In such studies, searching hundreds of thousands of markers for the few that are associated with disease brings out the multiple-hypothesis testing problem in its most severe form. We explore, in a two-stage design, how using the false discovery rate (FDR) can relieve the burden of a prohibitively strict significance level for single-marker tests while still controlling the number of false positive findings when there is more than one causal variant. FDR is the expected proportion of false positives among all significant findings. It can be approximated by $(1-p_0)\alpha / [(1-p_0)\alpha + p_0(1-\beta)]$, where $p_0$ is the proportion of true causal markers, $\alpha$ is the type I error rate, and $1-\beta$ is the power of a two-stage study. When 500,000 SNPs are genotyped in the first stage with a fixed SNP array and the most significant SNPs are genotyped in the second stage with standard but 20 times more expensive high-throughput techniques, savings of up to 20% in the minimum genotyping cost are achieved for $p_0$ in the range of $10^{-5}$ to $5 \times 10^{-4}$ and FDR in the range of 0.05 to 0.7, compared with using a Bonferroni-corrected significance level. In terms of sample size, the saving is up to 60%. However, these savings come at the cost of more false positive findings. © 2006 Elsevier B.V. All rights reserved.
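As a rough illustration of the FDR approximation in the abstract, read here as (1 - p0)·alpha / [(1 - p0)·alpha + p0·(1 - beta)] with alpha the type I error rate and 1 - beta the power, the following Python sketch evaluates the formula and inverts it algebraically to get the per-marker significance level implied by a target FDR. The function names and the example inputs (p0 = 1e-4, power = 0.8, target FDR = 0.05) are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not reproduce the authors' two-stage cost optimization.

```python
def fdr_approx(p0, alpha, power):
    """Approximate FDR: expected false positives over expected total positives.

    p0    -- proportion of markers that are truly causal
    alpha -- type I error rate of the (two-stage) test
    power -- 1 - beta, the power of the (two-stage) test
    """
    false_pos = (1.0 - p0) * alpha   # null markers declared significant
    true_pos = p0 * power            # causal markers declared significant
    return false_pos / (false_pos + true_pos)


def alpha_for_fdr(p0, fdr, power):
    """Solve the approximation above for alpha, given a target FDR."""
    return fdr * p0 * power / ((1.0 - p0) * (1.0 - fdr))


if __name__ == "__main__":
    n_markers = 500_000               # first-stage SNP count from the abstract
    bonferroni_alpha = 0.05 / n_markers

    # Hypothetical inputs, for illustration only.
    p0, power, target_fdr = 1e-4, 0.8, 0.05
    fdr_alpha = alpha_for_fdr(p0, target_fdr, power)

    print(f"Bonferroni alpha:  {bonferroni_alpha:.3g}")
    print(f"FDR-implied alpha: {fdr_alpha:.3g}")
    print(f"FDR check:         {fdr_approx(p0, fdr_alpha, power):.3g}")
```

Under these assumed inputs the FDR-implied significance level is considerably less strict than the Bonferroni level, which is the mechanism behind the cost and sample-size savings the abstract reports.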


Similar resources

Identifying significant gene‐environment interactions using a combination of screening testing and hierarchical false discovery rate control

Although gene-environment (G×E) interactions play an important role in many biological systems, detecting these interactions within genome-wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening-testing approach for G...


Optimal False Discovery Rate Control for Dependent Data.

This paper considers the problem of optimal false discovery rate control when the test statistics are dependent. An optimal joint oracle procedure, which minimizes the false non-discovery rate subject to a constraint on the false discovery rate is developed. A data-driven marginal plug-in procedure is then proposed to approximate the optimal joint procedure for multivariate normal data. It is s...


Optimal designs for two-stage genome-wide association studies.

Genome-wide association (GWA) studies require genotyping hundreds of thousands of markers on thousands of subjects, and are expensive at current genotyping costs. To conserve resources, many GWA studies are adopting a staged design in which a proportion of the available samples are genotyped on all markers in stage 1, and a proportion of these markers are genotyped on the remaining samples in s...


Two-stage designs for experiments with a large number of hypotheses

MOTIVATION: When a large number of hypotheses are investigated, the false discovery rate (FDR) is commonly applied in gene expression analysis or gene association studies. Conventional single-stage designs may lack power due to low sample sizes for the individual hypotheses. We propose two-stage designs where the first stage is used to screen the 'promising' hypotheses which are further investiga...


Programs for calculating the statistical powers of detecting susceptibility genes in case–control studies based on multistage designs

MOTIVATION: A two-stage association study is the most commonly used method among multistage designs to efficiently identify disease susceptibility genes. Recently, some SNP studies have utilized more than two stages to detect disease genes. However, there are few available programs for calculating statistical powers and positive predictive values (PPVs) of arbitrary n-stage designs. RESULTS: We...


Journal title:
  • Computational Statistics & Data Analysis

Volume 51

Publication date: 2006